About Character encodings in HTML

Translation and Dictionary: dictionary search [provisional development version]

Character encodings in HTML: English Wikipedia

Character encodings in HTML

HTML (Hypertext Markup Language) has been in use since 1991, but HTML 4.0 (December 1997) was the first standardized version where international characters were given reasonably complete treatment. When an HTML document includes special characters outside the range of seven-bit ASCII, two goals are worth considering: preserving the integrity of the information, and displaying it correctly in every browser.
==Specifying the document's character encoding==
There are several ways to specify which character encoding is used in the document. First, the web server can include the character encoding or "charset" in the Hypertext Transfer Protocol (HTTP) Content-Type header, which would typically look like this:
Content-Type: text/html; charset=ISO-8859-4
This method gives the HTTP server a convenient way to alter the document's encoding according to content negotiation; some HTTP server software can do this, for example Apache with the module mod_charset_lite.〔(Apache Module mod_charset_lite)〕
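On the receiving side, the charset parameter of such a header can be read with Python's standard library. This is only a minimal sketch; the helper name `charset_from_content_type` is an assumption made for illustration and does not come from the article.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header carries no charset parameter.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=ISO-8859-4"))
```

A caller would typically fall back to other sources of encoding information when the function returns None.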
For HTML it is possible to include this information inside the head element near the top of the document:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
HTML5 also allows the following syntax to mean exactly the same:
<meta charset="utf-8">
XHTML documents have a third option: to express the character encoding via the XML declaration, as follows:
<?xml version="1.0" encoding="utf-8"?>
Because the character encoding cannot be known until this declaration is parsed, it can be unclear which encoding is used for the declaration itself. The main principle is that the declaration shall be encoded in pure ASCII, and therefore (if the declaration is inside the file) the document's encoding needs to be an ASCII extension. To allow encodings that are not backwards compatible with ASCII, browsers must be able to parse declarations in such encodings. Examples of such encodings are UTF-16BE and UTF-16LE.
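The bootstrapping problem can be made concrete by looking at the first bytes of a declaration. The sketch below is a simplified illustration in the spirit of the autodetection appendix of the XML specification; it assumes no byte order mark is present and covers only three cases. The function name `guess_declaration_family` is hypothetical.

```python
def guess_declaration_family(data):
    """Guess the encoding family from the first four bytes of '<?xml'.

    Simplified: assumes no byte order mark and handles three cases only.
    """
    head = data[:4]
    if head == b"\x00<\x00?":
        return "utf-16be"          # each ASCII character preceded by a zero byte
    if head == b"<\x00?\x00":
        return "utf-16le"          # each ASCII character followed by a zero byte
    if head == b"<?xm":
        return "ascii-compatible"  # the declaration can be read as plain ASCII
    return None


if __name__ == "__main__":
    decl = '<?xml version="1.0" encoding="UTF-16BE"?>'
    print(guess_declaration_family(decl.encode("utf-16-be")))
    print(guess_declaration_family(decl.encode("ascii")))
```

Once the family is known, the parser can decode just enough of the stream to read the encoding name in the declaration.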
As of HTML5 the recommended charset is UTF-8. An "encoding sniffing algorithm" is defined in the specification to determine the character encoding of the document based on multiple sources of input, including:
# Explicit user instruction
# An explicit meta tag within the first 1024 bytes of the document
# A byte order mark within the first three bytes of the document
# The HTTP Content-Type or other transport layer information
# Analysis of the document bytes looking for specific sequences or ranges of byte values,〔(HTML5 prescan a byte stream to determine its encoding )〕 and other tentative detection mechanisms.
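How these inputs might be combined can be sketched as follows. This is a heavily simplified illustration, not the algorithm from the specification: the precedence order, the regular expression, the `windows-1252` fallback, and the function name `sniff_encoding` are all assumptions of this sketch, and the byte-analysis step is omitted entirely.

```python
import codecs
import re

# Simplified pattern covering both <meta charset=...> and the http-equiv form.
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)', re.IGNORECASE
)

_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
)


def sniff_encoding(data, http_charset=None, user_override=None,
                   fallback="windows-1252"):
    """Pick an encoding label from several hints (order is an assumption)."""
    if user_override:                      # explicit user instruction
        return user_override.lower()
    for bom, name in _BOMS:                # byte order mark at the very start
        if data.startswith(bom):
            return name
    if http_charset:                       # HTTP Content-Type information
        return http_charset.lower()
    match = _META_CHARSET.search(data[:1024])  # meta tag in the first 1024 bytes
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback                        # assumed default, not from the article


if __name__ == "__main__":
    print(sniff_encoding(b'<html><head><meta charset="UTF-8"></head></html>'))
```

A real implementation should follow the precedence and label normalization given in the specification instead of the order chosen here.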
For ASCII-compatible character encodings the consequence of choosing incorrectly is that characters outside the printable ASCII range (32 to 126) usually appear incorrectly. This presents few problems for English-speaking users, but other languages regularly (in some cases, always) require characters outside that range. In CJK environments, where several different multi-byte encodings are in use, auto-detection is also often employed. Finally, browsers usually permit the user to override an ''incorrect'' charset label manually as well.
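The effect of a wrong guess between two ASCII-compatible encodings can be reproduced in a few lines. The helper `misread` below is a hypothetical name used only for this illustration; it encodes text in one encoding and decodes the bytes as another.

```python
def misread(text, actual, assumed):
    """Encode text as `actual`, then decode the bytes as if they were `assumed`."""
    return text.encode(actual).decode(assumed, errors="replace")


if __name__ == "__main__":
    # Pure ASCII text survives the wrong guess unchanged.
    print(misread("cafe", "utf-8", "iso-8859-1"))
    # The accented letter is stored as two bytes in UTF-8 and is shown
    # as two unrelated characters when read as ISO-8859-1.
    print(misread("café", "utf-8", "iso-8859-1"))
```

The ASCII letters come through intact in both cases, which is why a mislabelled page can look correct until it contains a character outside that range.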
It is increasingly common for multilingual websites and websites in non-Western languages to use UTF-8, which allows use of the same encoding for all languages. UTF-16 or UTF-32, which can be used for all languages as well, are less widely used because they can be harder to handle in programming languages that assume a byte-oriented ASCII superset encoding, and they are less efficient for text with a high frequency of ASCII characters, which is usually the case for HTML documents.
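The size difference for ASCII-heavy markup is easy to measure. The following sketch compares the encoded length of a small fragment; the sample string and the function name `encoded_sizes` are assumptions chosen for illustration.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # A fragment made entirely of ASCII characters, as markup often is.
    print(encoded_sizes("<p>Hello</p>"))
```

For text consisting only of ASCII characters, UTF-16 needs twice and UTF-32 four times as many bytes as UTF-8; the ratio changes for text dominated by non-ASCII characters.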
Successful viewing of a page is not necessarily an indication that its encoding is specified correctly. If the page's creator and reader are both assuming some platform-specific character encoding, and the server does not send any identifying information, then the reader will nonetheless see the page as the creator intended, but other readers on different platforms or with different native languages will not see the page as intended.

Excerpt source: Wikipedia, the free encyclopedia